*New* The MotherDuck Native Integration is Live on Vercel Marketplace for Embedded Analytics and Data AppsLearn more

JOIN clause

Back to DuckDB Data Engineering Glossary

Overview

A JOIN clause combines rows from two or more tables based on a related column between them. It's one of the most fundamental operations in SQL, allowing you to bring together data that has been normalized across different tables. Think of it like connecting puzzle pieces - each table holds part of the information, and the join brings those pieces together into a complete picture.

NOTEThis is an overview. If you want to understand the details of how joins work in DuckDB, check out Chapter 3 in the free DuckDB in Action book.

Basic Syntax

The most common join syntax in DuckDB follows this pattern:

Copy code

SELECT * FROM table1 JOIN table2 ON table1.column = table2.column

Join Types in DuckDB SQL

Inner Join (default)

An inner join combines rows from two or more tables based on a matching condition, keeping only the rows where there is a match in both tables. Think of it like a Venn diagram - it returns just the overlapping part.

Copy code

SELECT orders.id, customers.name FROM orders JOIN customers ON orders.customer_id = customers.id

Left Outer Join

A left outer join returns all records from the left table (the first one mentioned) and only matching records from the right table. If there's no match in the right table, the result will contain NULL values for those columns.

The Left Outer Join (LEFT JOIN) is one of the most common types of joins in data analytics.

Copy code

SELECT products.name, reviews.rating FROM products LEFT JOIN reviews ON products.id = reviews.product_id

Right Outer Join

A right outer join is essentially the mirror image of a left outer join - it returns all records from the right table (the second one mentioned) and only matching records from the left table. If there's no match in the left table, NULL values are used for those columns.

Copy code

SELECT employees.name, departments.name FROM employees RIGHT JOIN departments ON employees.dept_id = departments.id

Full Outer Join

A full outer join (also called a full join) combines both left and right outer joins - it returns ALL records from BOTH tables. Where there are matches, it joins the data; where there's no match on either side, it fills in NULL values for the missing data.

Copy code

SELECT students.name, classes.name FROM students FULL OUTER JOIN classes ON students.class_id = classes.id

Cross Join (creates Cartesian product)

A cross join (also called a Cartesian join) creates a result set that combines every row from the first table with every row from the second table. It creates all possible combinations without any matching condition.

Copy code

SELECT employees.name, shifts.time FROM employees CROSS JOIN shifts

DuckDB-Specific Features

DuckDB includes special join types not commonly found in other databases:

ASOF Join

An ASOF (As-Of) join in DuckDB is a special type of join designed for time-series data that matches records based on the closest previous timestamp. It's particularly useful when working with data streams that have different sampling rates or when you need to align events with the most recent prior state.

Copy code

SELECT trades.price, quotes.bid FROM trades ASOF JOIN quotes ON trades.timestamp >= quotes.timestamp

Joining against files and subqueries

Additionally, DuckDB supports joining directly against files and subqueries without needing to create temporary tables first. For example:

Copy code

SELECT t1.*, t2.revenue FROM 'sales.parquet' t1 JOIN (SELECT * FROM 'revenue.csv') t2 ON t1.id = t2.id

Performance Considerations

DuckDB automatically optimizes join operations using various strategies including hash joins and merge joins. Unlike some databases that require explicit join hints, DuckDB's query optimizer typically selects the most efficient join strategy based on the data characteristics and available indexes.